Abstract

Serial and parallel algorithms for simulation of tandem queueing 
systems with infinite buffers are presented, and their performance is 
examined. It is shown that the algorithms, which are based on a simple
computational procedure, involve low time and memory requirements.
1 Introduction

The simulation of a queueing system is normally an iterative process which
involves generation of random variables associated with current events in
the system, and evaluation of the system state variables when new events
occur [1, 2, 3, 4, 5]. In a system being simulated, the random variables
may represent the interarrival and service times of customers, and determine
a random routing procedure for customers within a system with
nondeterministic routing. As state variables, the arrival and departure times
of customers, and the service initiation times can be considered.

The methods of generating random variables present one of the main
issues in computer simulation, which has been studied intensively in the
literature (see, e.g., [3]). In this paper, however, we assume, as in [2], that
for the random variables involved in simulating a queueing system, appropriate
realizations are available when required, and we therefore concentrate only
on algorithms for evaluating the system state variables from these
realizations.

The usual way to represent dynamics of queueing systems is based on 
recursive equations describing evolution of system state variables. Fur- 
thermore, these equations, which actually determine a global structure of 
changes in the state variables consecutively, have proved to be useful in
designing efficient simulation algorithms [B El HI E] • 

In this paper we apply recursive equations to the development of
algorithms for simulation of tandem queueing systems with infinite buffer
capacity. A serial algorithm and a parallel one, designed for implementation
on single instruction, multiple data parallel processors, are presented. These
algorithms are based on a simple computational procedure which exploits
a particular order of evaluating the system state variables from the
recursive equations. The analysis of their performance shows that the
algorithms involve low time and memory requirements.

The rest of the paper is organized as follows. In Section 2, we give
a representation of tandem systems based on recursive equations. This
representation is used in Section 3 to design both serial and parallel
simulation algorithms. Time and memory requirements of the algorithms are
also discussed in this section. Finally, Section 4 includes two lemmas which
offer a closer examination of the performance of the parallel algorithm.

2 Tandem Queues with Infinite Buffers 

To set up the recursive equations that underlie the development and analysis
of simulation algorithms in the next sections, consider a series of $M$
single-server queues with infinite buffers. Each customer that arrives into
the system is initially placed in the buffer at the 1st server and then has to
pass through all the queues one after the other. Upon the completion of
his service at server $i$, the customer is instantaneously transferred to queue
$i + 1$, $i = 1, \ldots, M - 1$, and occupies the $(i + 1)$st server provided
that it is free. If the customer finds this server busy, he is placed in its
buffer and has to wait until the service of all his predecessors is completed.

Denote the time between the arrivals of the $j$th customer and his predecessor
by $\tau_{0j}$, and the service time of the $j$th customer at server $i$ by
$\tau_{ij}$, $i = 1, \ldots, M$, $j = 1, 2, \ldots$. Furthermore, let $D_{0j}$
be the $j$th arrival epoch to the system, and $D_{ij}$ be the $j$th departure
epoch from the $i$th server. We assume that $\tau_{ij} \geq 0$ are given
parameters, whereas $D_{ij}$ are unknown state variables. Finally, we define
$D_{i0} = 0$ for all $i = 0, \ldots, M$, and $D_{-1,j} = 0$ for
$j = 1, 2, \ldots$.

With the condition that the tandem queueing system starts operating 
at time zero, and it is free of customers at the initial time, the recursive 
equations representing the system dynamics can readily be written as [H Oil] 

$$D_{ij} = (D_{i-1,j} \vee D_{i,j-1}) + \tau_{ij}, \qquad (1)$$

where $\vee$ denotes the maximum operator, $i = 0, 1, \ldots, M$,
$j = 1, 2, \ldots$. We can also rewrite (1) in another form intended to
provide the basic representation for parallel algorithms, as

$$B_{ij} = D_{i-1,j} \vee D_{i,j-1}, \qquad D_{ij} = B_{ij} + \tau_{ij}, \qquad (2)$$

where $B_{ij}$ stands for the $j$th initiation of service at server $i$,
$i = 0, 1, \ldots, M$, $j = 1, 2, \ldots$.
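As a minimal illustration, recursion (1) can be evaluated directly by filling a table of departure epochs customer by customer. The function name `departure_times` and the input layout (a list `tau` whose row 0 holds the interarrival times and whose rows 1 to M hold the service times) are assumptions made for this sketch only.

```python
def departure_times(tau):
    """Evaluate D[i][j] = max(D[i-1][j], D[i][j-1]) + tau[i][j] directly.

    tau[i][j-1] is the time for server i (i = 0 is the arrival stream)
    and customer j = 1..N. This layout is an assumption of the sketch.
    """
    M = len(tau) - 1          # number of servers
    N = len(tau[0])           # number of customers
    # Column 0 stores the initial conditions D[i][0] = 0.
    D = [[0.0] * (N + 1) for _ in range(M + 1)]
    for j in range(1, N + 1):
        for i in range(M + 1):
            # The boundary row D[-1][j] = 0 is handled explicitly.
            previous_stage = D[i - 1][j] if i > 0 else 0.0
            D[i][j] = max(previous_stage, D[i][j - 1]) + tau[i][j - 1]
    return D


# One server, three customers arriving one time unit apart,
# each requiring two time units of service.
D = departure_times([[1, 1, 1], [2, 2, 2]])
print(D[0][1:])  # arrival epochs
print(D[1][1:])  # departure epochs from server 1
```

In this small example the customers arrive at times 1, 2, 3 and leave the server at times 3, 5, 7, since each customer after the first has to wait for his predecessor.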

3 Algorithms for Tandem Queueing System Simulation

The simulation algorithms presented in this section are based on
equations (1) and (2), with the indices varied in a particular order which is
illustrated in Fig. 1(a). At each iteration $k$, $k = 1, 2, \ldots$, the
variables $D_{ij}$ with $i + j = k$ are evaluated. They form the diagonals
depicted in Fig. 1(a) by arrows; for each diagonal, the direction of the arrows
indicates the order in which the variables should be evaluated within the
associated iteration. Note that to obtain each element of a diagonal, only two
elements from the preceding diagonal are required, as Fig. 1(b) shows.
(a) The simulation schematic for a tandem system with M = 2, N = 5.
Figure 1: Simulation procedure schematics.

By applying the computational procedure outlined above, both serial
and parallel simulation algorithms which provide considerable savings in
time and memory costs may be readily designed. Specifically, the following
serial algorithm is intended for simulation of the first $N$ customers in a
tandem queueing system with $M$ servers.

Algorithm 1.

For each $k = 1, \ldots, M + N$, do

for $j = \max(1, k - M), \max(1, k - M) + 1, \ldots, \min(k, N)$,
compute $D_{k-j,j} = (D_{k-j-1,j} \vee D_{k-j,j-1}) + \tau_{k-j,j}$.
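The diagonal order of Algorithm 1 can be sketched as follows. The function name `simulate_diagonal`, the dictionary used to store the state variables, and the layout of `tau` (row 0 for interarrival times, rows 1 to M for service times) are assumptions of this sketch. The helper `value` raises a `KeyError` if a variable is requested before it has been computed, so a successful run also confirms that the order of evaluation is valid.

```python
def simulate_diagonal(tau):
    """Evaluate all D[(i, j)] diagonal by diagonal, with i + j = k."""
    M = len(tau) - 1
    N = len(tau[0])
    D = {}

    def value(i, j):
        # Boundary conditions: D[-1][j] = 0 and D[i][0] = 0.
        if i < 0 or j == 0:
            return 0.0
        return D[(i, j)]      # KeyError if the evaluation order were wrong

    for k in range(1, M + N + 1):
        for j in range(max(1, k - M), min(k, N) + 1):
            i = k - j
            D[(i, j)] = max(value(i - 1, j), value(i, j - 1)) + tau[i][j - 1]
    return D


D = simulate_diagonal([[1, 1, 1], [2, 2, 2]])
print(D[(1, 3)])  # departure epoch of the third customer from server 1
```

Each element of diagonal k uses only elements with i + j = k - 1, which is why the lookup never fails.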

Based on Fig. 1(a) as an illustration, it is not difficult to calculate the
total number of arithmetic operations which one has to perform using
Algorithm 1. Since each variable $D_{ij}$ can be obtained using one
maximization and one addition, all variables with $i = 0, \ldots, M$ and
$j = 1, \ldots, N$ require $2(M + 1)N$ operations, without considering index
manipulations.

Note that the order in which the variables $D_{ij}$ are evaluated within
each iteration is essential for reducing the memory used for computations. One
can easily see that only $O(\min(M + 1, N))$ memory locations are actually
required with this order. To illuminate the memory requirements, let us
represent Algorithm 1 in a more detailed form as

Algorithm 2.

Set $d_i = 0$, $i = -1, 0, \ldots, \min(M + 1, N)$.
For each $k = 1, \ldots, M + N$, do

for $j = \max(1, k - M), \max(1, k - M) + 1, \ldots, \min(k, N)$,
set $d_{k-j} \longleftarrow (d_{k-j-1} \vee d_{k-j}) + \tau_{k-j,j}$.

In Algorithm 2, the variable $d_i$ serves all the iterations to store the
current value of $D_{ij}$ for all $j = 1, \ldots, N$. Upon the completion of
the algorithm, we have for server $i$ the $N$th departure time saved in $d_i$,
$i = 0, 1, \ldots, M$.
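The in-place update of Algorithm 2 may be sketched as below. The function name `last_departures` and the layout of `tau` are assumptions; for simplicity the sketch always keeps M + 2 cells (one per index i = -1, 0, ..., M) and does not attempt the further reduction that is possible when N < M + 1.

```python
def last_departures(tau):
    """Return the N-th departure epoch for each i = 0..M using one array."""
    M = len(tau) - 1
    N = len(tau[0])
    # d[i + 1] plays the role of d_i; d[0] is the fixed boundary d_{-1} = 0.
    d = [0.0] * (M + 2)
    for k in range(1, M + N + 1):
        # j increases, so i = k - j decreases: d[i] (that is, d_{i-1}) still
        # holds its value from diagonal k - 1 when d[i + 1] is overwritten.
        for j in range(max(1, k - M), min(k, N) + 1):
            i = k - j
            d[i + 1] = max(d[i], d[i + 1]) + tau[i][j - 1]
    return d[1:]


print(last_departures([[1, 1, 1], [2, 2, 2]]))
```

For the one-server example with unit interarrival times and service times equal to 2, the function returns the third arrival epoch 3 and the third departure epoch 7.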

Finally, we present a parallel algorithm for tandem system simulation 
which is actually a simple modification of Algorithm 1. 

Algorithm 3.

For each $k = 1, \ldots, M + N$, do

in parallel, for $j = \max(1, k - M), \max(1, k - M) + 1, \ldots, \min(k, N)$,
compute $B_{k-j,j} = D_{k-j-1,j} \vee D_{k-j,j-1}$;

in parallel, for $j = \max(1, k - M), \max(1, k - M) + 1, \ldots, \min(k, N)$,
compute $D_{k-j,j} = B_{k-j,j} + \tau_{k-j,j}$.
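The two-phase structure of Algorithm 3 can be emulated serially, as in the following sketch. It is not a parallel implementation: each phase is written as a loop whose iterations are independent of one another, which is the property a single instruction, multiple data machine would exploit. The function name `simulate_two_phase`, the step counter, and the layout of `tau` are assumptions.

```python
def simulate_two_phase(tau):
    """Emulate Algorithm 3: per diagonal, all maxima first, then all sums."""
    M = len(tau) - 1
    N = len(tau[0])
    d = [0.0] * (M + 2)       # d[i + 1] stores d_i; d[0] is d_{-1} = 0
    parallel_steps = 0        # counts phases, assuming enough processors
    for k in range(1, M + N + 1):
        js = range(max(1, k - M), min(k, N) + 1)
        # Phase 1: every maximization reads only values of diagonal k - 1.
        b = {k - j: max(d[k - j], d[k - j + 1]) for j in js}
        # Phase 2: every addition writes a distinct cell.
        for j in js:
            i = k - j
            d[i + 1] = b[i] + tau[i][j - 1]
        parallel_steps += 2
    return d[1:], parallel_steps


d, steps = simulate_two_phase([[1, 1, 1], [2, 2, 2]])
print(d, steps)
```

With M = 1 and N = 3 the emulation performs 2(M + N) = 8 phases, and the extra dictionary `b` corresponds to the second set of memory locations used for the service initiation times.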

As in the case of Algorithm 1, we may conclude that Algorithm 3
entails $O(2\min(M + 1, N))$ memory locations. Furthermore, it is easy to
understand that Algorithm 3 requires the performance of $2(M + N)$ parallel
operations provided $P \geq \min(M + 1, N)$ processors are used. Otherwise,
if there are $P < \min(M + 1, N)$ processors available, one has to rearrange
computations so as to execute each iteration in several parallel steps. In
other words, all operations within an iteration should be separated into
groups of $P$ operations, and the groups assigned to sequential steps. We will
discuss time requirements and speedup of the algorithm in this case in the
next section.



4 Performance Study of the Parallel Simulation Algorithm

We now turn to the performance evaluation of Algorithm 3 with respect 
to the number of parallel processors. Note that the results of this section 
are obtained by considering the time taken to compute only the state
variables $D_{ij}$. In other words, in our analysis we ignore the time required for
computing indices, allocating and moving data, and synchronizing proces- 
sors, which in general can have an appreciable effect on the performance of 
parallel algorithms. 

Lemma 1. To simulate the first $N$ customers in a tandem queue with $M$
servers, Algorithm 3 using $P$ processors requires the time

$$T_P = O\left(2M + 2N + 2\left\lfloor \frac{L_1 - 1}{P} \right\rfloor (L_2 - P)\right), \qquad (3)$$

where $L_1 = \min(M + 1, N)$, $L_2 = \max(M + 1, N)$.



Proof. We start our proof by evaluating the exact number of parallel
operations to be performed when $P$ processors are available. As is easy to
see, at each iteration $k$, $k = 1, \ldots, M + N$, the algorithm first carries
out in parallel a fixed number of maximizations, and then does the same number
of additions. Denote this number by $l_k$. It follows from the above
description of the algorithm that the numbers $l_k$, $k = 1, \ldots, M + N$,
form the sequence with elements

$$1, 2, \ldots, L_1 - 1, \underbrace{L_1, \ldots, L_1}_{L_2 - L_1 + 1 \text{ times}}, L_1 - 1, L_1 - 2, \ldots, 1.$$

Since $l_k$ parallel operations may be performed using $P$ processors in the
time $\lfloor (l_k - 1)/P \rfloor + 1$, for the entire algorithm we have the
total time

$$T_P = 2 \sum_{k=1}^{M+N} \left( \left\lfloor \frac{l_k - 1}{P} \right\rfloor + 1 \right) = 4 \sum_{k=1}^{L_1} \left( \left\lfloor \frac{k - 1}{P} \right\rfloor + 1 \right) + 2(L_2 - L_1 - 1) \left( \left\lfloor \frac{L_1 - 1}{P} \right\rfloor + 1 \right). \qquad (4)$$

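To calculate $T_P$, let us first consider the sum

$$\sum_{k=1}^{L_1} \left\lfloor \frac{k - 1}{P} \right\rfloor = \underbrace{0 + \cdots + 0}_{P \text{ times}} + \underbrace{1 + \cdots + 1}_{P \text{ times}} + \cdots + \underbrace{\left( \left\lfloor \tfrac{L_1 - 1}{P} \right\rfloor - 1 \right) + \cdots + \left( \left\lfloor \tfrac{L_1 - 1}{P} \right\rfloor - 1 \right)}_{P \text{ times}} + \underbrace{\left\lfloor \tfrac{L_1 - 1}{P} \right\rfloor + \cdots + \left\lfloor \tfrac{L_1 - 1}{P} \right\rfloor}_{L_1 - P \lfloor (L_1 - 1)/P \rfloor \text{ times}}$$

$$= \frac{P}{2} \left\lfloor \frac{L_1 - 1}{P} \right\rfloor \left( \left\lfloor \frac{L_1 - 1}{P} \right\rfloor - 1 \right) + \left( L_1 - P \left\lfloor \frac{L_1 - 1}{P} \right\rfloor \right) \left\lfloor \frac{L_1 - 1}{P} \right\rfloor.$$

Substitution of this expression into (4), and trivial algebraic manipulations
give

$$T_P = 2(L_1 + L_2 - 1) + 2 \left( L_1 + L_2 - 1 - P - P \left\lfloor \frac{L_1 - 1}{P} \right\rfloor \right) \left\lfloor \frac{L_1 - 1}{P} \right\rfloor. \qquad (5)$$
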



Finally, since $L_1 + L_2 - 1 = M + N$, and $P \lfloor L/P \rfloor \sim L$ as
$L \to \infty$, we conclude that

$$T_P = O\left(2M + 2N + 2\left\lfloor \frac{L_1 - 1}{P} \right\rfloor (L_2 - P)\right)$$

as $M, N \to \infty$. □
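The exact count used in the proof can be checked numerically. The sketch below computes the time once by summing over the diagonal lengths $l_k$ and once from the closed form (5); the function names are assumptions, and only integer arithmetic is used.

```python
def diagonal_lengths(M, N):
    """Number of variables on each diagonal k = 1..M+N of Algorithm 3."""
    return [min(k, N) - max(1, k - M) + 1 for k in range(1, M + N + 1)]


def time_by_summation(M, N, P):
    """Two phases per diagonal, each taking floor((l_k - 1) / P) + 1 steps."""
    return 2 * sum((l - 1) // P + 1 for l in diagonal_lengths(M, N))


def time_closed_form(M, N, P):
    """Closed form (5) with L1 = min(M + 1, N), L2 = max(M + 1, N)."""
    L1, L2 = min(M + 1, N), max(M + 1, N)
    q = (L1 - 1) // P
    return 2 * (L1 + L2 - 1) + 2 * (L1 + L2 - 1 - P - P * q) * q


print(diagonal_lengths(2, 5))
print(time_by_summation(2, 5, 2), time_closed_form(2, 5, 2))
```

For M = 2 and N = 5 the diagonal lengths are 1, 2, 3, 3, 3, 2, 1, and both functions give 20 time units with P = 2 processors.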



Note that in the two critical cases with $P = 1$ and $P \geq \min(M + 1, N)$,
the order produced by (3) coincides with the exact times, which are equal to
$2(M + 1)N$ and $2(M + N)$, respectively.

Lemma 2. For a tandem system with $M$ servers, Algorithm 3 using $P$
processors achieves the speedup

$$S_P = \frac{M + 1}{1 + \lfloor M/P \rfloor} \quad \text{as } N \to \infty. \qquad (6)$$

Proof. To evaluate the speedup, which is defined as

$$S_P = T_1 / T_P, \qquad (7)$$

first note that $T_1 = 2(M + 1)N$.

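Let us examine $T_P$. Assuming $M$ to be fixed, and $N \to \infty$, we obtain

$$L_1 = \min(M + 1, N) = M + 1, \qquad L_2 = \max(M + 1, N) = N.$$

In that case, from (5) we have

$$T_P = 2(M + N) + 2 \left\lfloor \frac{M}{P} \right\rfloor \left( M + N - P - P \left\lfloor \frac{M}{P} \right\rfloor \right) \sim 2N \left( 1 + \lfloor M/P \rfloor \right) \quad \text{as } N \to \infty.$$

Finally, substitution of this expression together with that for $T_1$ into (7)
leads us to the desired result. □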
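The limit in Lemma 2 can be illustrated numerically by comparing the ratio $T_1/T_P$, computed from the closed form (5), with the limiting value (6) for a large number of customers. The function names and the sample values of M, N, and P are assumptions of this sketch.

```python
def parallel_time(M, N, P):
    """Exact time (5) of Algorithm 3 with P processors."""
    L1, L2 = min(M + 1, N), max(M + 1, N)
    q = (L1 - 1) // P
    return 2 * (L1 + L2 - 1) + 2 * (L1 + L2 - 1 - P - P * q) * q


def speedup(M, N, P):
    """Speedup (7): time with one processor over time with P processors."""
    return parallel_time(M, N, 1) / parallel_time(M, N, P)


def limiting_speedup(M, P):
    """Limiting value (6) of the speedup as N grows."""
    return (M + 1) / (1 + M // P)


M, P = 7, 4
for N in (10, 1000, 10**6):
    print(N, speedup(M, N, P))
print("limit:", limiting_speedup(M, P))
```

With M = 7 servers and P = 4 processors the limiting speedup is 4, and the computed ratio approaches this value from below as N grows.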
Corollary 3. For a tandem system with $M$ servers, Algorithm 3 using
$P = M + 1$ processors achieves linear speedup as the number of customers
$N \to \infty$.

Proof. It follows from (6) that with $P = M + 1$ the speedup $S_P = O(P)$ as
$N \to \infty$. □